Chen Zhong, Mingyi Zhao, Gaoyao Xiao, Jun Xu
Pennsylvania State University
czz111@psu.edu
PRIMARY
Student Team: YES
D3
Google BigQuery
Excel
Mind42
ARSCA, an analytical reasoning support tool for Cyber Analysis, developed by
S2 Research Lab, PSU: http://s2.ist.psu.edu/paper/paper42-Zhong-ISI2013-final.pdf
May we post your submission in the Visual Analytics Benchmark
Repository after VAST Challenge 2013 is complete? YES
Video:
http://personal.psu.edu/czz111/VAST/VAST2013/psu-zhong-mc3-vedio.wmv
http://personal.psu.edu/czz111/VAST/VAST2013/psu-zhong-mc3-vedio.swf
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Questions
MC3.1 – Provide a timeline (i.e., events organized
in chronological order) of the notable events that occur in Big Marketing’s
computer networks for the two weeks of supplied data. Use all data at your
disposal to identify up to twelve events and describe them to the extent
possible. Your answer should be no more
than 1000 words long and may contain up to twelve images.
Event 1: Periodic UPnP multicast
Time:
7:00-8:00 from 2013-04-01 to 2013-04-15.
Description:
Beginning on 2013-04-01, from 7:00 to 8:00, most workstations in the
organizational network periodically sent multicast SSDP packets to the UPnP
multicast address (239.255.255.250:1900). UPnP vulnerabilities exposed internal
workstations (especially those in subnet 172.10.1.*) to IPs outside the
network. The periodic multicast is shown in Figure 1, a timeline-based heatmap
(called “Timeline-Heatmap”) with time on the x-axis and source IP on the
y-axis. The color of each cell encodes the number of multicast packets sent
from the corresponding IP in a given hour (yellow is 0, black is the maximum).
Figure 1. Timeline-Heatmap: Number of multicast packets per hour from internal
network IPs. Scale: yellow = 0, black = 200.
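The aggregation and color scale behind a Timeline-Heatmap can be sketched as follows. This is a minimal Python sketch, not the D3 script used for Figure 1: the timestamp format, the function names, and the linear yellow-to-black interpolation are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(records):
    """Count records per (source IP, hour).

    `records` is an iterable of (timestamp, source_ip) pairs. The
    'YYYY-MM-DD HH:MM:SS' timestamp format is an assumption.
    """
    counts = Counter()
    for ts, src_ip in records:
        hour = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(minute=0, second=0)
        counts[(src_ip, hour)] += 1
    return counts


def cell_color(count, maximum):
    """Map a count to a hex color: 0 is yellow, `maximum` (or more) is black.

    Linear interpolation is an assumption; the original scale may differ.
    """
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    level = min(max(count, 0), maximum) / maximum  # clamp to [0, 1]
    channel = round(255 * (1 - level))             # red and green fade together
    return "#{0:02x}{0:02x}00".format(channel)
```

Each heatmap cell is then the color of `hourly_counts(...)[(ip, hour)]` under the scale maximum stated in the caption (200 for Figure 1).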
Figure 2. Timeline-Heatmap: Number of connections with 10.199.250.2 per hour.
Scale: yellow = 0, black = 20.
Event 2: FTP connections from outside IP 10.199.250.2 to workstations in
subnet 172.10.1.* and the administrative workstation
Time: 2013-04-01 8:30 – 16:30; 2013-04-02 10:11, 11:52; 2013-04-03 6:00, 12:23;
2013-04-05 7:24, 10:08.
Description:
After the UPnP multicast, beginning around 2013-04-01 8:30, workstations in
subnet 172.10.1.* sent their information (e.g. configuration) to 10.199.250.2
over TCP connections. 10.199.250.2 then responded to several workstations in
subnet 172.10.1.*.
Beginning on 2013-04-02 10:00, 10.199.250.2 was able to establish FTP
connections (with some payload) to the administrative workstation
(172.10.0.40). These connections are shown in the first row of Figure 2. The
network behavior of the administrative workstation is shown in the
Timeline-Heatmap in Figure 3; it mainly kept broadcasting to the network.
10.199.250.2 may have altered the workstation's broadcast UDP packets in order
to contact other workstations in 172.10.1.*.
Figure 3. Timeline-Heatmap: Number of connections with the administrative
workstation (172.10.0.40) in week 1 and week 2. Scale: yellow = 0, black = 80.
Event 3: DC Server’s update
Time: 2013-04-01 8:30, 2013-04-02 6:00
Description:
In the third Timeline-Heatmap in Figure 4, DC03 used the LLMNR protocol to
send a name resolution query to 224.0.0.252 around 2013-04-01 8:30.
After this update, DC03 could be reached by outside hosts 10.6.6.6 and
10.7.7.10. Beginning at 2013-04-02 6:00, all three DCs (DC01, DC02, DC03)
began continuously sending UDP packets to the switch 172.0.0.1. Around
2013-04-02 7:00, DC01 sent a UDP packet to a fake destination (192.168.3.4).
Figure 4. Timeline-Heatmap: Number of connections with the DC servers over the
two weeks. Scale: yellow = 0, black = 80.
Event 4: 10.0.3.77 kept sending mail to Mail Server 01 throughout the two weeks
Time: Began at 2013-04-02 8:00 and continued through both weeks; Mail Server 01
problem reported around 2013-04-02 10:00
Description:
The Timeline-Heatmap for 10.0.3.77 is the first of the two figures captioned
Figure 5 (directly below). Its connections with Mail Server 01 (172.10.0.3) are
shown in the first row. Opening the filtered data in Excel shows continuous
connections via port 25 every 2 or 3 minutes. This could be a spam attack.
Around 2013-04-02 10:00, BigBrother reported that Mail Server 01 was in a
problematic state.
Figure 5. Timeline-Heatmap: Number of connections with 10.0.3.77 in week 1 and
week 2. Scale: yellow = 0, black = 100.
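The "every 2 or 3 minutes" observation for port 25 can be checked programmatically as well as by eye in Excel. The sketch below is a hypothetical Python check, not part of the original workflow: the timestamp format, the function names, and the 120 to 180 second band with a 90% threshold are assumptions.

```python
from datetime import datetime


def connection_gaps(timestamps):
    """Return the gaps, in seconds, between consecutive connection times."""
    times = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


def looks_periodic(gaps, low=120, high=180, share=0.9):
    """True if at least `share` of the gaps fall within [low, high] seconds."""
    if not gaps:
        return False
    in_band = sum(1 for gap in gaps if low <= gap <= high)
    return in_band / len(gaps) >= share
```

Applied to the port 25 connections from 10.0.3.77 to 172.10.0.3, a result of `True` would support the reading that the traffic is automated rather than typed by a person.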
Event 5: Large number of connections from multiple outside hosts to Web
Server 03 (DoS attack)
Time: 2013-04-03
Description:
On 2013-04-03, 9:00 – 12:00, outside hosts (10.9.81.5) sent an intense stream
of connection requests to Web Server 03 (about 100,000 connections per hour).
The three hours of connections from 10.9.81.5 to 172.30.0.4 (Web Server 03)
are shown in the middle of Figure 5 (the heatmap for 10.9.81.5).
The BigBrother report (Figure 6) shows that Web Server 03 stopped working after
2013-04-03 12:46. It was restarted on 2013-04-05 8:31.
Figure 5. Timeline-Heatmap: Number of connections with 10.9.81.5 over the two
weeks. Scale: yellow = 0, black = 100,000.
Figure 6. The status of Web Server 03 reported by BigBrother.
Event 6: 10.9.81.5 launched a port scanning attack against all the servers in
the network
Time: Began at 2013-04-06 12:00
Description:
Figure 7 shows, from left to right, the timelines of 10.9.81.5 scanning
subnets 1, 2, and 3. Subnet 1 was scanned at 2013-04-06 12:00 and
2013-04-07 3:00, while subnets 2 and 3 were scanned only at 2013-04-06 12:00.
Figure 7. Timeline of the number of ports scanned by 10.9.81.5 in subnets 1, 2,
and 3.
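The per-subnet scan counts plotted in Figure 7 amount to counting distinct (destination IP, destination port) pairs per hour. A minimal Python sketch follows; it is not the BigQuery and GoogleViz script actually used. The tuple layout, the function name, and the use of the first two octets as the subnet key are assumptions.

```python
from collections import defaultdict


def scanned_ports(flows, scanner_ip):
    """Count distinct (dst IP, dst port) pairs probed by `scanner_ip`.

    `flows` is an iterable of (hour, src_ip, dst_ip, dst_port) tuples.
    Results are keyed by (hour, subnet), where the subnet is taken to be the
    first two octets of the destination IP (an assumption).
    """
    seen = defaultdict(set)
    for hour, src_ip, dst_ip, dst_port in flows:
        if src_ip == scanner_ip:
            subnet = ".".join(dst_ip.split(".")[:2])
            seen[(hour, subnet)].add((dst_ip, dst_port))
    return {key: len(pairs) for key, pairs in seen.items()}
```

A sudden jump in this count for a single source within one hour is the signature we would expect from a scan.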
Event 7: Remote Desktop Login
Time: 2013-04-07 11:00
Description:
After 10.9.81.5 port-scanned the servers, it mounted a Remote Desktop login
attack against the web servers 172.10.0.4, 172.10.0.9, 172.10.0.5, 172.20.0.6,
and 172.30.0.7. We suspect that the attacker used Remote Desktop connections
to control these servers. RDP connections can also be observed in week 2.
Figure 8 displays the number of RDP connections per hour over time.
Figure 8. RDP Connection Timeline
Event 8: Large number of denied IPS connection attempts
Time: Began at 2013-04-11 12:00
Description:
We observed many deny entries in the IPS log (Figure 9). After checking the
data in Excel, we concluded that these deny entries record failed port scan
activity.
Figure 9. IPS Warning (Deny) Entry Timeline
Event 9: SSH connections
Time: 2013-04-12 9:00
Description:
Many outbound SSH connections started to appear and lasted until the end of
week 2. We identified 8 workstations as sources and 1 outside IP address,
which could be a botnet C&C server. We believe that these workstations were
infected and were exfiltrating data to the attacker.
Figure 10. SSH Connection Timeline
Event 10: The workstations in subnet 172.10.1.* requested large payloads from
outside server 10.1.0.100
Time: 2013-04-13 to 2013-04-14
Description:
2013-04-13 6:30-7:30: 172.10.1.* began to request large payloads from outside
server 10.1.0.100.
2013-04-13 23:30 to 2013-04-14 2:00: 172.10.1.* continued downloading from
10.1.0.100.
2013-04-14 6:30-7:30: 172.10.1.* continued downloading from 10.1.0.100.
2013-04-14 9:00-10:00: 172.10.1.* continued downloading and finished.
Event 11: Eight workstations were used as bots to launch a DDoS attack against
outside host/server 10.1.0.100
Time: 2013-04-13 7:00-8:00; 2013-04-14 7:00-8:00
Description:
Eight workstations (shown on the x-axis of Figure 11) have a large number of
connections (with small payloads) to 10.1.0.100 in the two time slots.
Figure 11. Timeline-Heatmap: Number of connections to 10.1.0.100 over the two
weeks. Scale: yellow = 0, black = 100,000.
Event 12: 10.6.6.7 launched a DoS attack against the DC servers
Time: 2013-04-15 9:00-10:00
Description:
The right side of Figure 4 shows that 10.6.6.7 connected intensively to the
three DC servers. Because the payloads are not large and nearly all the source
ports differ, it could be a DoS attack.
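The two cues used here (small payloads and source ports that almost never repeat) can be summarized per source and destination pair. The following Python sketch is illustrative only: the input layout and the function name are assumptions, and no thresholds from our analysis are implied.

```python
def dos_indicators(flows):
    """Summarize one source/destination pair within a time window.

    `flows` is a list of (src_port, payload_bytes) tuples (assumed layout).
    A distinct_port_ratio near 1.0 means almost every connection used a new
    source port; a low mean_payload means little data was carried.
    """
    total = len(flows)
    if total == 0:
        return {"connections": 0, "distinct_port_ratio": 0.0, "mean_payload": 0.0}
    distinct_ports = {port for port, _ in flows}
    return {
        "connections": total,
        "distinct_port_ratio": len(distinct_ports) / total,
        "mean_payload": sum(size for _, size in flows) / total,
    }
```

A high connection count combined with a ratio near 1.0 and a small mean payload is consistent with a flooding pattern, although it does not prove one.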
MC3.2 – Speculate on one or more narratives that
describe the events on the network. Provide a list of analytic hypotheses
and/or unanswered questions about the notable events. In other words, if you
were to hand off your timeline to an analyst who will conduct further investigation,
what confirmations and/or answers would you like to see in their report back to
you? Your answer should be no more than 300 words long and may contain up to
three additional images.
Narrative 1. The attacker conducted a port scan against Big Marketing’s
servers and found vulnerabilities. By exploiting these vulnerabilities, the
attacker was able to use Remote Desktop Protocol (RDP) to log in to the victim
servers and install malware. The RDP connections started at 2013-04-07 11:00.
Since these web servers are visited by Big Marketing’s workstations, the
implanted malware spread to the workstations. Starting at 2013-04-12 9:00, bots
(infected workstations) began using SSH to communicate with the C&C server in
order to receive instructions or exfiltrate sensitive data of the Big Marketing
company.
Narrative 2. The attacker exploited UPnP vulnerabilities. Because of these
vulnerabilities, some workstations were exposed to the outside internet. An
outside C&C server took control of the workstations and installed rootkits on
them. After the C&C server learned more about the network, it conducted a port
scan against the servers. The remaining process is similar to the
corresponding part of Narrative 1.
Narrative 3. Some workstations were infected by a drive-by-download attack.
The attacker (10.0.3.77) sent spear phishing emails through the SMTP server to
Big Marketing’s employees. Some employees read the malicious emails and clicked
the links in them, and their workstations were infected by the attacker’s
malware. The remaining process is similar to the corresponding part
of Narrative
Hypotheses and Unanswered Questions
1. We strongly suspect that the RDP and SSH connections in the logs are
attacker activity. However, they could be legitimate, because the IPS allows
these types of traffic. The analyst should investigate these connections
further to determine whether they are malicious.
2. The BigBrother report (Figure 6) shows that Web Server 03 stopped working
after 2013-04-03 12:46 and was restarted on 2013-04-05 8:31. Was this server
shut down by the administrator?
3. We mentioned eight victims in Event 11. Who are they? What assets do they
hold? Are they key nodes in the network? The analyst needs to look into their
configuration.
4. Is 10.1.0.100 a malicious server or a legitimate client of Big Marketing?
Why did the internal workstations receive large payloads from it? The analyst
needs to gather and analyze the packet data.
MC3.3 – Describe the role that your visual
analytics played in enabling discovery of the notable events in MC3.1. Describe
whether your visual analytics play a role in formulating the questions in
MC3.2. Your answer should be no more than 300 words long and may contain up to
three additional images.
Our visual analytics is the visualized analytical reasoning process that we
recorded in Mind42 (called the AOH-Map).
Figure 12. Partial AOH-Map. Public link: http://mind42.com/public/431032f5-4a12-4fd8-ad93-de9462a463fa
Visualizing the analysis process has several advantages.
First, recording our analysis process in a shared representation makes it
convenient for us to introduce new ideas to each other and to share new
findings. We represent our analytical process as an iterative cycle involving
three components: action, observation, and hypothesis. A new hypothesis is
generated from an existing observation. Verifying the hypothesis requires
further actions, which lead to new observations. In the AOH-Map, this cycle
appears as a tree structure.
Second, it enables us to divide our work by hypothesis and to conduct
hypothesis-based collaboration. More specifically, each of us creates new
hypotheses based on existing observations, and each of us chooses a hypothesis
to work on. We then record the actions taken to verify it and the resulting
observations, from which new hypotheses can be generated. In our experience,
hypothesis-based collaboration is very effective.
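The action, observation, and hypothesis cycle described above maps naturally onto a tree. The sketch below is a hypothetical Python representation, not the Mind42 or ARSCA data model: the class name, the field names, and the example node texts are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str  # "action", "observation", or "hypothesis"
    text: str
    children: List["Node"] = field(default_factory=list)

    def add(self, kind, text):
        """Attach a child node and return it so that calls can be chained."""
        child = Node(kind, text)
        self.children.append(child)
        return child


def open_hypotheses(node):
    """Collect hypotheses that have no follow-up action yet (leaf nodes)."""
    found = []
    if node.kind == "hypothesis" and not node.children:
        found.append(node.text)
    for child in node.children:
        found.extend(open_hypotheses(child))
    return found


# Illustrative nodes only; they are not copied from the actual AOH-Map.
root = Node("action", "Count RDP connections per hour")
obs = root.add("observation", "RDP connections to web servers start 2013-04-07 11:00")
hyp = obs.add("hypothesis", "Attacker controls the servers over RDP")
```

Listing the open hypotheses gives each analyst a pool of unverified hypotheses to pick from, which is the division of work described above.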
Therefore, to answer MC3.3: each notable event is summarized from the
observations we recorded during the analysis process. The answer to MC3.2 is
also directly reflected in the hypotheses in the AOH-Map. Since each action
aims to verify a hypothesis, every visualization we developed was driven by a
specific goal. We therefore propose the idea of the “visualization function”:
instead of developing or adopting a complex, integrated visualization tool, we
use only these refined functions:
• (PERL FILTER_IP, datasource, ip) // Perl script to filter big data
• (PERL COUNT_HOURLY, datasource, field) // Perl script to aggregate big data
• (EXCEL HISTOGRAM field) // draw an Excel histogram
• (EXCEL FILTER field) // use the Excel filter
• (EXCEL SORT field) // use Excel to sort
• (BIGQUERY table sql) // run an SQL query on Google’s BigQuery
• (BIGQUERY GOOGLEVIZ sql, chartscript) // Google script that uses GoogleViz (e.g. Figure 7)
• (D3-TIME-HEATMAP, ip, field) // JavaScript using the D3 package to draw a Timeline-Heatmap for time series analysis (e.g. Figure 1)
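To illustrate how two of these functions compose, here is a minimal Python sketch of a filter step feeding an hourly aggregation. The original scripts were written in Perl and are not reproduced here; the column names (`time`, `src_ip`, `dst_ip`, `dst_port`), the function signatures, and the sample rows are assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def filter_ip(rows, ip, fields=("src_ip", "dst_ip")):
    """Yield the rows in which `ip` appears in any of the address fields."""
    for row in rows:
        if any(row.get(name) == ip for name in fields):
            yield row


def count_hourly(rows, field, time_field="time"):
    """Count rows per (hour, value of `field`)."""
    counts = Counter()
    for row in rows:
        hour = row[time_field][:13]  # 'YYYY-MM-DD HH' prefix of the timestamp
        counts[(hour, row[field])] += 1
    return counts


# Made-up rows for illustration; not taken from the challenge data.
sample = io.StringIO(
    "time,src_ip,dst_ip,dst_port\n"
    "2013-04-02 08:00:10,10.0.3.77,172.10.0.3,25\n"
    "2013-04-02 08:02:40,10.0.3.77,172.10.0.3,25\n"
    "2013-04-02 08:03:00,10.9.81.5,172.30.0.4,80\n"
    "2013-04-02 09:01:00,10.0.3.77,172.10.0.3,25\n"
)
result = count_hourly(filter_ip(csv.DictReader(sample), "10.0.3.77"), "dst_ip")
```

Because `filter_ip` is a generator, the filter and the aggregation run in a single pass over the file, which matters for logs too large to load into Excel.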
Although each “visualization function” is simple, the functions are convenient
to combine in various ways according to our particular needs. Moreover, an
existing “visualization function” is also an intermediate result of the
analysis process, because others can reuse it by applying it to another data
set. Our practice so far shows that the current eight functions give us strong
support for our analysis.